Introduction

In many biomedical and social science settings, observations are clustered within units, and outcomes within a unit may be correlated with one another. For example, ____. This clustering violates the traditional regression assumption that all observations are independent and complicates the analysis. For cluster-randomized trials (CRTs) with continuous outcomes, linear mixed models (LMMs), which assume that observations are independent conditional on cluster membership, are the preferred approach to data analysis. CRTs are a widely used experimental design [cite several examples], and LMMs can flexibly accommodate nested levels of clustering and differing cluster sizes and are robust to some types of missing data patterns, making them an attractive option for data analysts. Generalized linear mixed models (GLMMs) extend the approach to binary, count, and categorical outcomes.

Even when observations within a cluster happen to be uncorrelated, including a random intercept for each cluster is still recommended, since the degree of within-cluster correlation is not known ahead of time [citation from textbook or whatever]. Furthermore, because the within-cluster correlation is estimated during model fitting, an estimated correlation of zero can result, in which case inferences are identical to those from an ordinary linear model.

Given the popularity and flexibility of (G)LMMs, they have been used in many different contexts and with widely varying sample sizes. [Reference earlier work about common cluster sizes and # of clusters.] Small sample studies are not uncommon [cite a few examples], with as few as ___ observations per cluster and ___ clusters total [CITATION].

Inferences from (G)LMMs with small sample sizes can be problematic. [List various facts about small sample performance with appropriate citations.] Experimental results from simulation studies have shown ___, ___, and ___, with appropriate citations (or note the lack of such results).

Our original goal was to study the impact on Type I error (TIE) rates of allowing the cluster-level variance to differ by treatment arm in a CRT, but along the way we discovered that in many settings the TIE rates were far from their nominal level whether or not the variances were allowed to differ. This work examines the realized TIE rates in a simulation study, showing that major software packages can yield TIE rates well away from the nominal level in small samples.

Design

We generated clustered data from the null model

\[ y_{ij} = b_{0i} + e_{ij} \]

for clusters \(i = 1, 2, \ldots, K\) and individuals \(j = 1, 2, \ldots, N\) within each cluster. The random intercept for cluster \(i\) was distributed \(b_{0i} \sim N(0, \sigma_b^2)\), and the residual error \(e_{ij} \sim N(0, \sigma^2)\). We then fit a model that assumed the clusters were evenly divided into two arms, with indicator \(x_{ij} \in \{0, 1\}\) for treatment and control, allowing for clustering:

\[ y_{ij} = \beta_0 + \beta_1 x_{ij} + b_{0i} + e_{ij} \]

We gathered p-values for the \(\beta_1\) coefficient and, since the data-generating mechanism had a true \(\beta_1\) value of zero, compared the TIE rate to the nominal \(\alpha = .05\) level.
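As a concrete illustration, the data-generating step can be sketched as follows (a minimal sketch in Python; the parameter values shown are arbitrary examples, the function name is our own, and the study itself used R and SAS). Arms are assigned by alternating clusters, which yields the even split described above when \(K\) is even:

```python
import numpy as np

def simulate_null(K, N, sigma_b2, sigma2, rng):
    """Simulate y_ij = b_0i + e_ij for K clusters of N subjects each."""
    b0 = rng.normal(0.0, np.sqrt(sigma_b2), size=K)    # cluster random intercepts
    e = rng.normal(0.0, np.sqrt(sigma2), size=(K, N))  # residual errors
    y = b0[:, None] + e                                # K x N outcome matrix
    cluster = np.repeat(np.arange(K), N)               # cluster label per subject
    x = np.repeat(np.arange(K) % 2, N)                 # alternate arms by cluster
    return y.ravel(), x, cluster

rng = np.random.default_rng(2024)
y, x, cluster = simulate_null(K=10, N=5, sigma_b2=0.1, sigma2=1.0, rng=rng)
```

Note that the treatment indicator plays no role in generating \(y\), which is what makes every rejection of \(\beta_1 = 0\) a Type I error.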

We performed our analysis on all combinations of the following data-generating parameters:

  • total number of clusters \(K\in \{10, 20, 40, 100\}\), divided evenly between the two treatment arms

  • subjects per cluster \(N \in \{ 3, 5, 10, 20, 50\}\)

  • \(\sigma_b^2 \in \{0.001, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5\}\)

  • \(\sigma^2 \in \{0.1, 1, 10\}\)
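The full factorial design over these parameters can be enumerated directly; a small sketch (variable names are our own):

```python
from itertools import product

Ks = [10, 20, 40, 100]
Ns = [3, 5, 10, 20, 50]
sigma_b2s = [0.001, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5]
sigma2s = [0.1, 1, 10]

# One tuple (K, N, sigma_b2, sigma2) per simulation scenario.
grid = list(product(Ks, Ns, sigma_b2s, sigma2s))
print(len(grid))  # 4 * 5 * 7 * 3 = 420 parameter combinations
```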

For each combination, we performed 10,000 simulations and gathered p-values for \(\beta_1\) in two different ways. First, we used model-based Wald test p-values from SAS PROC MIXED and from the R packages lmerTest (for models fit with lme4) and nlme, fitting with restricted maximum likelihood (REML). Second, we generated p-values from the same packages using the likelihood ratio test, this time fitting with maximum likelihood (ML).
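Given the 10,000 p-values for a parameter combination, the realized TIE rate and its Monte Carlo uncertainty follow directly; a sketch (the uniform draws here merely stand in for p-values from an actual, well-calibrated test):

```python
import numpy as np

def tie_rate(pvals, alpha=0.05):
    """Fraction of null-model p-values below alpha: the realized Type I error rate."""
    return float(np.mean(np.asarray(pvals) < alpha))

nsim = 10_000
# Under the null, a well-calibrated test yields Uniform(0, 1) p-values.
rng = np.random.default_rng(1)
pvals = rng.uniform(size=nsim)

rate = tie_rate(pvals)
# Binomial Monte Carlo standard error at the nominal level:
mc_se = np.sqrt(0.05 * 0.95 / nsim)  # ~0.0022 with 10,000 simulations
```

With 10,000 simulations per combination, realized rates more than a few multiples of 0.0022 away from .05 indicate genuine miscalibration rather than simulation noise.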

Talk about DF choices for lmerTest, nlme, and SAS.

Finally, we also used [fill in: the SAS Kenward-Roger (KR) correction].

Results

[As many non-repetitive graphs of the same result as possible. Might put nclust/nsub on the axes, or maybe sigb/sig, or something else. Heat map?]

Other possible ways of looking at the data

Discussion